XeonPhi Meets Astrophysical Fluid Dynamics
This white paper reports on our efforts to optimize a 2D/3D astrophysical (magneto-)hydrodynamics Fortran code for
Xeon Phi. The code is parallelized with OpenMP and is suitable for execution on a shared memory system. Due to the
complexity of the code, combined with the immaturity of the compiler, we were unable to stay within the boundaries of the Intel
Compiler Suite. To deliver performance we took two different approaches. First, we optimized and partially rewrote
most of the bandwidth-bound Fortran code to recover scalability on Xeon Phi. Next, we ported several critical compute-
bound hotspots to the Intel SPMD Program Compiler (ISPC), which offers performance portability of a single source code
across various architectures, such as Xeon, Xeon Phi and possibly even GPUs. This approach allowed us to achieve over
4x speed-up of the original code on a dual-socket Ivy Bridge EP, and over 50x speed-up on the Xeon Phi coprocessor. While
the resulting optimized code can already be used in production to solve specific problems, we consider this project to be
a proof-of-concept case reflecting the difficulty of achieving acceptable performance from Xeon Phi on a "home-brewed"
application.
Sapporo2: A versatile direct N-body library
Astrophysical direct N-body methods have been one of the first production
algorithms to be implemented using NVIDIA's CUDA architecture. Now, almost
seven years later, the GPU is the most used accelerator device in astronomy for
simulating stellar systems. In this paper we present the implementation of the
Sapporo2 N-body library, which allows researchers to use the GPU for N-body
simulations with little to no effort. The first version, released five years
ago, is actively used, but lacks advanced features and versatility in numerical
precision and support for higher order integrators. In this updated version we
have rebuilt the code from scratch and added support for OpenCL,
multi-precision and higher order integrators. We show how to tune these codes
for different GPU architectures and present how to continue utilizing the GPU
optimally even when only a small number of particles is integrated.
This careful tuning allows Sapporo2 to be faster than Sapporo1 even with the
added options and double precision data loads. The code runs on a range of
NVIDIA and AMD GPUs in single and double precision accuracy. With the addition
of OpenCL support the library is also able to run on CPUs and other
accelerators that support OpenCL. Comment: 15 pages, 7 figures. Accepted for publication in Computational
Astrophysics and Cosmology
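The core computation a direct N-body library like this offloads to the GPU is the all-pairs gravitational force sum. A minimal CPU-side sketch in C++ of what is evaluated per particle (illustrative only; function and variable names are assumptions, not Sapporo2's actual interface):

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Pairwise gravitational acceleration with Plummer softening eps:
//   a_i = sum_{j != i} m_j (r_j - r_i) / (|r_j - r_i|^2 + eps^2)^(3/2)
// This O(N^2) loop is the hotspot that direct N-body GPU libraries accelerate,
// with each GPU thread typically accumulating the sum for one particle i.
std::vector<std::array<double, 3>> accelerations(
    const std::vector<std::array<double, 3>>& pos,
    const std::vector<double>& mass, double eps) {
  const std::size_t n = pos.size();
  std::vector<std::array<double, 3>> acc(n, {0.0, 0.0, 0.0});
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      if (i == j) continue;
      const double dx = pos[j][0] - pos[i][0];
      const double dy = pos[j][1] - pos[i][1];
      const double dz = pos[j][2] - pos[i][2];
      const double r2 = dx * dx + dy * dy + dz * dz + eps * eps;
      const double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
      acc[i][0] += mass[j] * dx * inv_r3;
      acc[i][1] += mass[j] * dy * inv_r3;
      acc[i][2] += mass[j] * dz * inv_r3;
    }
  }
  return acc;
}
```

The quadratic cost of this sum is why tuning for small particle counts, as the abstract describes, matters: with few particles there is too little work to saturate the GPU unless multiple pairs are processed per thread.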
The present day mass function in the central region of the Arches cluster
We study the evolution of the mass function in young and dense star clusters
by means of direct N-body simulations. Our main aim is to explain the recent
observations of the relatively flat mass function observed near the centre of
the Arches star cluster. In this region, the power-law index of the mass
function for stars more massive than about 5-6 solar masses is larger than the
Salpeter value by about unity, whereas further out, and for the lower mass
stars, the mass function resembles the Salpeter distribution. We show that the
peculiarities in the Arches mass function can be explained satisfactorily
without primordial mass segregation. We draw two conclusions from our
simulations: 1) the Arches initial mass function is consistent with a Salpeter
slope down to ~1 solar mass; 2) the cluster is about halfway towards core
collapse. The cores of other star clusters with characteristics similar to
those of the Arches are expected to show similar flattening in the mass
functions for the high-mass (>5 solar masses) stars. Comment: 6 pages with 6 figures and 1 table. Submitted to the letters section
of MNRAS. Incorporates changes following suggestions by the referee
On the Onset of Runaway Stellar Collisions in Dense Star Clusters - I. Dynamics of the First Collision
We study the circumstances under which first collisions occur in young and dense star clusters. The initial conditions for our direct N-body simulations are chosen such that the clusters experience core collapse within a few million years, before the most massive stars have left the main sequence. It turns out that the first collision is typically driven by the most massive stars in the cluster. Upon arrival in the cluster core, by dynamical friction, massive stars tend to form binaries. The enhanced cross section of the binary compared to that of a single star causes other stars to engage the binary. A collision between one of the binary components and the incoming third star is then mediated by the encounters between the binary and other cluster members. Due to the geometry of the binary-single star engagement, the relative velocity at the moment of impact differs substantially from that in a two-body encounter. This may have profound consequences for the further evolution of the collision product.
A sparse octree gravitational N-body code that runs entirely on the GPU processor
We present parallel algorithms for constructing and traversing sparse octrees
on graphics processing units (GPUs). The algorithms are based on parallel-scan
and sort methods. To test the performance and feasibility, we implemented them
in CUDA in the form of a gravitational tree-code which completely runs on the
GPU. (The code is publicly available at:
http://castle.strw.leidenuniv.nl/software.html) The tree construction and
traversal algorithms are portable to many-core devices which have support for
the CUDA or OpenCL programming languages. The gravitational tree-code outperforms
tuned CPU code during tree construction and shows an overall performance improvement
of more than a factor of 20, resulting in a processing rate of more than
2.8 million particles per second. Comment: Accepted version. Published in Journal of Computational Physics. 35
pages, 12 figures, single column
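Tree construction by parallel scan and sort typically starts from Morton keys: interleaving the bits of the quantized coordinates yields a key whose sort order groups particles by octree cell, so the tree levels fall out of a single parallel sort. A C++ sketch of the standard key computation (an illustration of the general technique, under the assumption of 10 bits per axis; not this code's exact implementation):

```cpp
#include <algorithm>
#include <cstdint>

// Spread the low 10 bits of v so they occupy every third bit position,
// leaving room to interleave the other two coordinates.
uint32_t expand_bits(uint32_t v) {
  v = (v | (v << 16)) & 0x030000FFu;
  v = (v | (v << 8))  & 0x0300F00Fu;
  v = (v | (v << 4))  & 0x030C30C3u;
  v = (v | (v << 2))  & 0x09249249u;
  return v;
}

// 30-bit Morton key for a point with coordinates scaled to [0, 1].
// Sorting particles by this key places members of the same octree cell
// next to each other, which is what makes sort-based tree builds work.
uint32_t morton_key(double x, double y, double z) {
  auto quantize = [](double c) {
    return static_cast<uint32_t>(std::min(std::max(c * 1024.0, 0.0), 1023.0));
  };
  return (expand_bits(quantize(x)) << 2) |
         (expand_bits(quantize(y)) << 1) |
          expand_bits(quantize(z));
}
```

After the sort, parallel scans over the key array can mark cell boundaries at each level (the top 3*level bits of the key identify the cell), which matches the scan-and-sort construction the abstract describes.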
SAPPORO: A way to turn your graphics cards into a GRAPE-6
We present Sapporo, a library for performing high-precision gravitational
N-body simulations on NVIDIA Graphical Processing Units (GPUs). Our library
mimics the GRAPE-6 library, and N-body codes currently running on GRAPE-6 can
switch to Sapporo by a simple relinking of the library. The precision of our
library is comparable to that of GRAPE-6, even though internally the GPU
hardware is limited to single-precision arithmetic. This limitation is
effectively overcome by emulating double precision for calculating the distance
between particles. The performance loss of this operation is small (< 20%)
compared to the advantage of being able to run at high precision. We tested the
library using several GRAPE-6-enabled N-body codes, in particular with Starlab
and phiGRAPE. We measured peak performance of 800 Gflop/s for running with 10^6
particles on a PC with four commercial G92 architecture GPUs (two GeForce
9800GX2). As a production test, we simulated a 32k Plummer model with equal
mass stars well beyond core collapse. The simulation took 41 days, during which
the mean performance was 113 Gflop/s. The GPU did not show any problems from
running in a production environment for such an extended period of time. Comment: 13 pages, 9 figures, accepted to New Astronomy
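The double-precision emulation mentioned above is commonly done with "double-single" (two-float) arithmetic: each coordinate is stored as an unevaluated sum of two floats, and the subtraction is carried out with an error-free transformation so the rounding error of the high parts is recovered. A self-contained C++ sketch of the idea (the general technique, assumed here; not Sapporo's actual code):

```cpp
#include <cmath>

// A value represented as the unevaluated sum hi + lo of two floats,
// giving roughly twice the significand bits of a single float.
struct DS { float hi, lo; };

// Split a double into its double-single representation.
DS to_ds(double x) {
  float hi = static_cast<float>(x);
  float lo = static_cast<float>(x - static_cast<double>(hi));
  return {hi, lo};
}

// Double-single subtraction: Knuth's two-sum recovers the exact rounding
// error of the high-part difference, then the low parts are folded in.
// This is how a coordinate difference keeps near-double accuracy on
// single-precision hardware.
DS ds_sub(DS a, DS b) {
  float s = a.hi - b.hi;
  float v = s - a.hi;
  float err = (a.hi - (s - v)) - (b.hi + v);  // exact error of s
  float lo = err + (a.lo - b.lo);
  float hi = s + lo;
  return {hi, lo - (hi - s)};
}
```

Only the distance computation needs this treatment; the accumulated force partials remain cheap single-precision operations, which is consistent with the small (< 20%) performance loss the abstract reports.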